NSF PAR Search | NSF Public Access Repository

Efficient High-Throughput DNA Breathing Features Generation Using Jax-EPBD

https://doi.org/10.1101/2024.12.06.627191

Inan, Toki Tahmid; Kabir, Anowarul; Rasmussen, Kim; Shehu, Amarda; Usheva, Anny; Bishop, Alan; Alexandrov, Boian; Bhattarai, Manish (December 2024, bioRxiv)

Abstract: DNA breathing dynamics (transient base-pair opening and closing due to thermal fluctuations) are vital for processes like transcription, replication, and repair. Traditional models, such as the Extended Peyrard-Bishop-Dauxois (EPBD), provide insights into these dynamics but are computationally limited for long sequences. We present JAX-EPBD, a high-throughput Langevin molecular dynamics framework leveraging JAX for GPU-accelerated simulations, achieving up to 30x speedup and superior scalability compared to the original C-based EPBD implementation. JAX-EPBD efficiently captures time-dependent behaviors, including bubble lifetimes and base flipping kinetics, enabling genome-scale analyses. Applying it to transcription factor (TF) binding affinity prediction using SELEX datasets, we observed consistent improvements in R² values when incorporating breathing features with sequence data. Validating on the 77-bp AAV P5 promoter, JAX-EPBD revealed sequence-specific differences in bubble dynamics correlating with transcriptional activity. These findings establish JAX-EPBD as a powerful and scalable tool for understanding DNA breathing dynamics and their role in gene regulation and transcription factor binding.
Full Text Available
Scalable DNA Feature Generation and Transcription Factor Binding Prediction via Deep Surrogate Models

https://doi.org/10.1101/2024.12.06.626709

Kabir, Anowarul; Inan, Toki Tahmid; Rasmussen, Kim; Shehu, Amarda; Usheva, Anny; Bishop, Alan; Alexandrov, Boian; Bhattarai, Manish (December 2024, bioRxiv)

Abstract: Simulating DNA breathing dynamics, for instance with the Extended Peyrard-Bishop-Dauxois (EPBD) model, across the entire human genome using traditional biophysical methods like pyDNA-EPBD is computationally prohibitive due to intensive techniques such as Markov Chain Monte Carlo (MCMC) and Langevin dynamics. To overcome this limitation, we propose a deep surrogate generative model utilizing a conditional Denoising Diffusion Probabilistic Model (DDPM) trained on DNA sequence-EPBD feature pairs. This surrogate model efficiently generates high-fidelity DNA breathing features conditioned on DNA sequences, reducing computational time from months to hours, a speedup of over 1000 times. By integrating these features into the EPBDxDNABERT-2 model, we enhance the accuracy of transcription factor (TF) binding site predictions. Experiments demonstrate that the surrogate-generated features perform comparably to those obtained from the original EPBD framework, validating the model’s efficacy and fidelity. This advancement enables real-time, genome-wide analyses, significantly accelerating genomic research and offering powerful tools for disease understanding and therapeutic development.
Full Text Available
Protein Decoy Generation via Adaptive Stochastic Optimization for Protein Structure Determination

https://doi.org/10.1109/BIBM49941.2020.9313102

Zaman, Ahmed Bin; Inan, Toki Tahmid; Shehu, Amarda (December 2020, IEEE Intl Conf on Bioinformatics and Biomedicine (BIBM))
Full Text Available
Adaptive Stochastic Optimization to Improve Protein Conformation Sampling

https://doi.org/10.1109/TCBB.2021.3134103

Zaman, Ahmed Bin; Inan, Toki Tahmid; De Jong, Kenneth; Shehu, Amarda (January 2021, IEEE/ACM Transactions on Computational Biology and Bioinformatics)

We have long known that characterizing protein structure is key to understanding protein function. Computational approaches have largely addressed a narrow formulation of the problem, seeking to compute one native structure from an amino-acid sequence. Now AlphaFold2 promises to reveal a high-quality native structure for possibly many proteins. However, researchers over the years have argued for broadening our view to account for the multiplicity of native structures. We now know that many protein molecules switch between different structures to regulate interactions with molecular partners in the cell. Elucidating such structures de novo is exceptionally difficult, as it requires exploring a possibly very large structure space in search of competing, near-optimal structures. Here we report on a novel stochastic optimization method capable of revealing very different structures for a given protein from knowledge of its amino-acid sequence. The method leverages evolutionary search techniques and adapts its search to balance exploration and exploitation under a fixed computational budget. In addition to demonstrating the utility of this method for identifying multiple native structures, we provide a benchmark dataset for researchers to continue work on this problem.
Full Text Available
